OCR Test - Works offline

Robert Theis程式庫與試用程式2017-08-16

價格：免費

更新日期：2017-08-16

檔案大小：10.0M

目前版本：0.6.0

版本需求：Android 2.3 以上版本

官方網站：http://www.rmtheis.com

Email：robert.m.theis@gmail.com

聯絡地址：San Luis Obispo, CA 93401 USA

Experimental app for optical character recognition (OCR)

This app is an experimental app that I developed several years ago that demonstrates use of the Tesseract OCR engine to recognize text in images captured by the device camera.

This app runs OCR on your device – without uploading your images to a server – and is suitable for recognizing individual words or short phrases of text, but this app is intended for hobbyists and software developers interested in OCR and not for general audiences.

In contrast to Google's Mobile Vision API, this app is able to recognize text printed in non-Latin-based fonts while offline. To achieve this, this app incorporates an unusually large amount of training data for several languages. This training data is stored on your phone, and this app takes up much more space than ordinary apps.

No image pre-processing is performed by this app before handing off captured image frames to Tesseract, so the app is not tuned for any specific use case and, as a result, its recognition accuracy and speed is heavily dependent on situational factors like perspective, lighting, and font type.

Source code for this app is available on GitHub (with minor changes to accommodate GitHub file size restrictions). The code for this app is a combination of open source camera-related code from the ZXing bar code scanner project and open source optical character recognition code from the Tesseract OCR project.

TEXT CAPTURE

The default single-shot capture runs OCR on a snapshot image that's captured when you click the shutter button, like a regular photo.

When the "continuous preview" checkbox is checked, the app shows a dynamic, real-time display of what the device is recognizing right beside the camera viewfinder. The continuous preview mode works best on a fast device.

USING THIS APP

• Point the device at a small region of text and touch the on-screen shutter button to start OCR.

• For recognizing individual Chinese/Japanese/Korean characters, set the page segmentation mode to "single character."

RECOGNITION ACCURACY

• Various factors can cause the OCR to fail: uneven illumination, stylized text, or text without enough contrast from the background. Try to have good lighting.

• Hold the device steady, and be sure the picture is in focus.

• If you need to scan a large block of text or an entire document, try a document scanning app such as Text Fairy instead.

LANGUAGES

• This app supports several languages/scripts not recognized by Google Translate.

• Supported languages for OCR:

Afrikaans

Albanian

Amharic

Arabic

Assamese

Azerbaijani

Azerbaijani (Cyrillic)

Basque

Belarusian

Bengali

Bosnian

Bulgarian

Burmese

Catalan

Cebuano

Cherokee

Chinese (Simplified)

Chinese (Traditional)

Croatian

Czech

Danish

Dutch

Dzongkha

English

English, Middle (1100-1500)

Esperanto

Estonian

Finnish

Frankish

French

French, Middle (ca. 1400-1600)

Galician

Georgian

Georgian - Old

German

Greek, Ancient (-1453)

Greek, Modern (1453-)

Gujarati

Haitian

Hebrew

Hindi

Hungarian

Icelandic

Indonesian

Inuktitut

Irish

Italian

Italian - Old

Japanese

Javanese

Kannada

Kazakh

Khmer

Korean

Kurdish

Kyrgyz

Lao

Latin

Latvian

Lithuanian

Macedonian

Malay

Malayalam

Maltese

Marathi

Nepali

Norwegian

Oriya

Pashto

Persian

Polish

Portuguese

Punjabi

Romanian

Russian

Sanskrit

Serbian

Serbian (Latin)

Sinhala

Slovak

Slovenian

Spanish

Spanish - Old

Swahili

Swedish

Syriac

Tagalog

Tajik

Tamil

Telugu

Thai

Tibetan

Tigrinya

Turkish

Ukrainian

Urdu

Uyghur

Uzbek

Uzbek (Cyrillic)

Vietnamese

Welsh

Yiddish

SAMSUNG DEVICE NOTES

• On Samsung Galaxy devices, you may need to long-press the menu button to set preferences.

• You may get better results if you un-check "Standard focus mode".